Search for: All records

Creators/Authors contains: "Hancock, ed., John"


Total Resources: 15

Resource Type: Journal Article (15); Conference Paper, Conference Proceeding, Dataset, Workshop Report (0 each)

Availability: Full Text / Resource Available (15); Citation Only (0)



Identifying antimicrobial peptides using word embedding with deep recurrent neural networks

https://doi.org/10.1093/bioinformatics/bty937

Hamid, Md-Nafiz ; Friedberg, Iddo ; Hancock, ed., John ( November 2018 , Bioinformatics)

Abstract Motivation
Antibiotic resistance constitutes a major public health crisis, and finding new sources of antimicrobial drugs is crucial to solving it. Bacteriocins, which are bacterially produced antimicrobial peptide products, are candidates for broadening the available choices of antimicrobials. However, the discovery of new bacteriocins by genomic mining is hampered by their sequences’ low complexity and high variance, which frustrates sequence similarity-based searches.
Results
Here we use word embeddings of protein sequences to represent bacteriocins, and apply a word embedding method that accounts for amino acid order in protein sequences, to predict novel bacteriocins from protein sequences without using sequence similarity. Our method predicts, with high probability, six previously unknown putative bacteriocins in Lactobacillus. More generally, representing sequences with word embeddings that preserve sequence order can be applied to peptide and protein classification problems for which sequence similarity cannot be used.
Availability and implementation
Data and source code for this project are freely available at: https://github.com/nafizh/NeuBI.
Supplementary information
Supplementary data are available at Bioinformatics online.
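To make "word embeddings of protein sequences" concrete, the first step is turning a sequence into an ordered list of "words". The sketch below is an illustration only, assuming overlapping k-mers with a hypothetical k = 3; the abstract does not state the exact tokenization NeuBI uses. In a full pipeline these ordered words would typically be passed to an embedding model (for example, word2vec) and then to a recurrent network, which is omitted here.

```python
def protein_to_words(sequence: str, k: int = 3) -> list[str]:
    """Split a protein sequence into overlapping k-mer 'words', in sequence order.

    k=3 is an illustrative assumption, not necessarily the value used by NeuBI.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.upper()
    # Each word starts one residue after the previous one, so word order
    # mirrors residue order along the peptide.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


print(protein_to_words("MKTAYIAK"))
```

Because the words are kept in order rather than pooled into a bag, a downstream recurrent model can still see which k-mer follows which, which is the order information the abstract emphasizes.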

New algorithms for detecting multi-effect and multi-way epistatic interactions

https://doi.org/10.1093/bioinformatics/btz463

Ansarifar, Javad ; Wang, Lizhi ; Hancock, ed., John ( June 2019 , Bioinformatics)

Abstract Motivation
Epistasis, the phenomenon of genetic interactions, plays a central role in many scientific discoveries. However, due to the combinatorial nature of the problem, it is extremely challenging to decipher the exact combinations of genes that trigger the epistatic effects. Many existing methods focus only on two-way interactions. Some of the most effective methods use machine learning techniques, but many were designed for special case-and-control studies or suffer from overfitting. We propose three new algorithms for multi-effect and multi-way epistasis detection: one guarantees global optimality, and the other two are local-optimization-oriented heuristics.
Results
The computational performance of the proposed heuristic algorithm was compared with several state-of-the-art methods using a yeast dataset. Results suggested that searching for the globally optimal solution could be extremely time-consuming, but the proposed heuristic algorithm was much more effective and efficient than the others at finding a close-to-optimal solution. Moreover, it provided biological insight into the exact configurations of epistases, in addition to achieving higher prediction accuracy than the state-of-the-art methods.
Availability and implementation
Data source was publicly available and details are provided in the text.
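The "combinatorial nature of the problem" can be quantified by counting candidate gene sets. The sketch below assumes a hypothetical panel of 1,000 loci and treats every subset of size 2 up to a chosen maximum order as one candidate interaction; it is not one of the paper's algorithms, only a count of the search space they must navigate.

```python
from math import comb


def candidate_interactions(n_loci: int, max_order: int) -> int:
    """Number of gene subsets of size 2..max_order among n_loci loci."""
    return sum(comb(n_loci, k) for k in range(2, max_order + 1))


# Hypothetical panel size, chosen only for illustration.
for order in (2, 3, 4):
    print(order, candidate_interactions(1000, order))
```

Each extra interaction order multiplies the search space by roughly a factor of n/k, which is why exhaustive search for the global optimum becomes expensive and heuristics become attractive.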

The choice of sequence homologs included in multiple sequence alignments has a dramatic impact on evolutionary conservation analysis

https://doi.org/10.1093/bioinformatics/bty523

Gil, Nelson ; Fiser, Andras ; Hancock, ed., John ( June 2018 , Bioinformatics)

Abstract Motivation
The analysis of sequence conservation patterns has been widely used for over half a century to identify functionally important (catalytic and ligand-binding) protein residues. Despite decades of development, state-of-the-art non-template-based functional residue prediction methods must, on average, predict ∼25% of a protein’s total residues to correctly identify half of the protein’s functional site residues. The overwhelming proportion of false positives results in reported ‘F-Scores’ of ∼0.3. We investigated the limits of current approaches, focusing on the so-far neglected impact of the specific choice of homologs included in multiple sequence alignments (MSAs).
Results
The limits of conservation-based functional residue prediction were explored by surveying the binding sites of 1023 proteins. A straightforward conservation analysis of MSAs composed of randomly selected homologs sampled from a PSI-BLAST search achieves average F-Scores of ∼0.3, a performance matching that reported by state-of-the-art methods, which often consider additional features for the prediction in a machine learning setting. Interestingly, we found that a simple combinatorial MSA sampling algorithm will in almost every case produce an MSA with an optimal set of homologs whose conservation analysis reaches average F-Scores of ∼0.6, doubling state-of-the-art performance. We also show that this is nearly at the theoretical limit of possible performance given the agreement between different binding site definitions. Additionally, we showcase the progress in this direction made by Selection of Alignment by Maximal Mutual Information (SAMMI), an information-theory-based approach to identifying biologically informative MSAs. This work highlights the importance and the unused potential of optimally composed MSAs for conservation analysis.
Supplementary information
Supplementary data are available at Bioinformatics online.
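To illustrate how a conservation-based prediction and its F-Score are computed, here is a minimal sketch assuming per-column Shannon entropy as the conservation measure and a toy alignment with a hypothetical binding site. This is a generic baseline, not SAMMI or the paper's sampling algorithm; it only shows why predicting many residues trades precision for recall.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one MSA column, ignoring gap characters."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return float("inf")  # all-gap column: treat as unconserved
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in Counter(residues).values())


def predict_conserved(msa: list[str], fraction: float) -> set[int]:
    """Predict the most conserved `fraction` of columns as functional residues."""
    n_cols = len(msa[0])
    entropies = [column_entropy("".join(seq[i] for seq in msa)) for i in range(n_cols)]
    n_pick = max(1, round(fraction * n_cols))
    ranked = sorted(range(n_cols), key=lambda i: entropies[i])  # lowest entropy first
    return set(ranked[:n_pick])


def f_score(predicted: set[int], actual: set[int]) -> float:
    """Harmonic mean of precision and recall over residue positions."""
    tp = len(predicted & actual)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(actual)
    return 2 * precision * recall / (precision + recall)


# Toy alignment and hypothetical binding-site positions.
msa = ["MKHLC", "MRHIC", "MKHVC", "MQHLC"]
site = {2, 4}
print(f_score(predict_conserved(msa, 0.4), site))
print(f_score(predict_conserved(msa, 0.6), site))
```

Which homologs go into `msa` changes every column entropy and therefore the ranking, which is the effect the paper studies.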

De novo haplotype reconstruction in viral quasispecies using paired-end read guided path finding

https://doi.org/10.1093/bioinformatics/bty202

Chen, Jiao ; Zhao, Yingchao ; Sun, Yanni ; Hancock, ed., John ( April 2018 , Bioinformatics)

Abstract Motivation
RNA virus populations contain different but genetically related strains, all infecting an individual host. Reconstruction of the viral haplotypes is a fundamental step to characterize the virus population, predict their viral phenotypes and finally provide important information for clinical treatment and prevention. Advances in next-generation sequencing technologies open up new opportunities to assemble full-length haplotypes. However, error-prone short reads, high similarity between related strains, and an unknown number of haplotypes pose computational challenges for reference-free haplotype reconstruction. There is still much room to improve the performance of existing haplotype assembly tools.
Results
In this work, we developed a de novo haplotype reconstruction tool named PEHaplo, which uses paired-end reads to distinguish highly similar strains in viral quasispecies data. It was applied to both simulated and real quasispecies data, and the results were benchmarked against several recently published de novo haplotype reconstruction tools. The comparison shows that PEHaplo outperforms the benchmarked tools across a comprehensive set of metrics.
Availability and implementation
The source code and the documentation of PEHaplo are available at https://github.com/chjiao/PEHaplo.
Supplementary information
Supplementary data are available at Bioinformatics online.

Deep learning improves antimicrobial peptide recognition

https://doi.org/10.1093/bioinformatics/bty179

Veltri, Daniel ; Kamath, Uday ; Shehu, Amarda ; Hancock, ed., John ( March 2018 , Bioinformatics)

Abstract Motivation
Bacterial resistance to antibiotics is a growing concern. Antimicrobial peptides (AMPs), natural components of innate immunity, are popular targets for developing new drugs. Machine learning methods are now commonly adopted by wet-laboratory researchers to screen for promising candidates.
Results
In this work, we utilize deep learning to recognize antimicrobial activity. We propose a neural network model with convolutional and recurrent layers that leverage primary sequence composition. Results show that the proposed model outperforms state-of-the-art classification models on a comprehensive dataset. By utilizing the embedding weights, we also present a reduced-alphabet representation and show that reasonable AMP recognition can be maintained using nine amino acid types.
Availability and implementation
Models and datasets are made freely available through the Antimicrobial Peptide Scanner vr.2 web server at www.ampscanner.com.
Supplementary information
Supplementary data are available at Bioinformatics online.
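A network with an embedding layer, like the convolutional and recurrent model described above, needs each peptide as a fixed-length sequence of integer tokens. The sketch below assumes index 0 is reserved for padding, the 20 standard amino acids map to 1..20, and sequences are right-padded or truncated to a hypothetical `max_len`; these choices are illustrative and not taken from the AMP Scanner code. The `embed` helper is a hypothetical plain-Python stand-in for an embedding layer lookup.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Index 0 is reserved for padding; residues map to 1..20 (an assumption).
AA_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode_peptide(sequence: str, max_len: int) -> list[int]:
    """Map a peptide to integer tokens, truncated or right-padded with 0 to max_len.

    Raises KeyError for non-standard residue letters.
    """
    tokens = [AA_TO_INDEX[aa] for aa in sequence.upper()[:max_len]]
    return tokens + [0] * (max_len - len(tokens))


def embed(tokens: list[int], weights: list[list[float]]) -> list[list[float]]:
    """Look up one embedding row per token (hypothetical stand-in for an embedding layer)."""
    return [weights[t] for t in tokens]


tokens = encode_peptide("ACK", 5)
weights = [[float(i), float(-i)] for i in range(len(AMINO_ACIDS) + 1)]  # toy 2-D embeddings
print(tokens, embed(tokens, weights))
```

After training, the rows of such a weight matrix can be compared or clustered per amino acid, which is the kind of information the abstract uses to derive a reduced alphabet.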

Scaling read aligners to hundreds of threads on general-purpose processors

https://doi.org/10.1093/bioinformatics/bty648

Langmead, Ben ; Wilks, Christopher ; Antonescu, Valentin ; Charles, Rone ; Hancock, ed., John ( July 2018 , Bioinformatics)

Abstract Motivation
General-purpose processors can now contain many dozens of processor cores and support hundreds of simultaneous threads of execution. To make best use of these threads, genomics software must contend with new and subtle computer architecture issues. We discuss some of these and propose methods for improving thread scaling in tools that analyze each read independently, such as read aligners.
Results
We implement these methods in new versions of Bowtie, Bowtie 2 and HISAT. We greatly improve thread scaling in many scenarios, including on the recent Intel Xeon Phi architecture. We also highlight how bottlenecks are exacerbated by variable-record-length file formats like FASTQ and suggest changes that enable superior scaling.
Availability and implementation
Experiments for this study: https://github.com/BenLangmead/bowtie-scaling.
Bowtie: http://bowtie-bio.sourceforge.net
Bowtie 2: http://bowtie-bio.sourceforge.net/bowtie2
HISAT: http://www.ccb.jhu.edu/software/hisat
Supplementary information
Supplementary data are available at Bioinformatics online.
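The input bottleneck described above can be illustrated with a common pattern: worker threads share one input file behind a lock and each takes a batch of reads at a time. Because FASTQ records have variable length, a thread cannot jump to "record N" by byte offset; it must scan lines while holding the lock. The sketch below is an illustration of that pattern, not the Bowtie/HISAT implementation; it assumes strict four-line records, uses hypothetical names and batch sizes, and in CPython the GIL limits real parallel speedup.

```python
import io
import threading
from itertools import islice


def read_batch(handle, lock, batch_size):
    """Read up to batch_size four-line FASTQ records while holding the input lock."""
    with lock:
        lines = list(islice(handle, 4 * batch_size))
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def worker(handle, input_lock, batch_size, results, results_lock):
    while True:
        batch = read_batch(handle, input_lock, batch_size)
        if not batch:
            return
        # Placeholder for per-read work such as alignment: record read lengths.
        lengths = [len(rec[1].rstrip("\n")) for rec in batch]
        with results_lock:
            results.extend(lengths)


def process_fastq(text, n_threads=4, batch_size=16):
    handle = io.StringIO(text)
    input_lock, results_lock = threading.Lock(), threading.Lock()
    results = []
    threads = [
        threading.Thread(target=worker,
                         args=(handle, input_lock, batch_size, results, results_lock))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


fastq = "".join(f"@r{i}\n{'A' * (i % 7 + 1)}\n+\n{'I' * (i % 7 + 1)}\n" for i in range(100))
print(len(process_fastq(fastq)))
```

Larger batches mean fewer lock acquisitions per read, which is one lever for reducing contention as thread counts grow.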

SciApps: a cloud-based platform for reproducible bioinformatics workflows

https://doi.org/10.1093/bioinformatics/bty439

Wang, Liya ; Lu, Zhenyuan ; Van Buren, Peter ; Ware, Doreen ; Hancock, ed., John ( June 2018 , Bioinformatics)

Abstract Motivation
The rapid accumulation of both sequence and phenotype data generated by high-throughput methods has increased the need to store and analyze data on distributed storage and computing systems. Efficient data management across these heterogeneous systems requires a workflow management system to simplify the task of analysis through automation and make large-scale bioinformatics analyses accessible and reproducible.
Results
We developed SciApps, a web-based platform for reproducible bioinformatics workflows. The platform is designed to automate the execution of modular Agave apps and support execution of workflows on local clusters or in a cloud. Two workflows, one for association and one for annotation, are provided as exemplar scientific use cases.
Availability and implementation
https://www.sciapps.org
Supplementary information
Supplementary data are available at Bioinformatics online.

Misunderstood parameter of NCBI BLAST impacts the correctness of bioinformatics workflows

https://doi.org/10.1093/bioinformatics/bty833

Shah, Nidhi ; Nute, Michael G. ; Warnow, Tandy ; Pop, Mihai ; Hancock, ed., John ( September 2018 , Bioinformatics)

Genome Context Viewer: visual exploration of multiple annotated genomes using microsynteny

https://doi.org/10.1093/bioinformatics/btx757

Cleary, Alan ; Farmer, Andrew ; Hancock, ed., John ( November 2017 , Bioinformatics)

Abstract Summary
The Genome Context Viewer is a visual data-mining tool that allows users to search across multiple providers of genome data for regions with similarly annotated content that may be aligned and visualized at the level of their shared functional elements. By handling ordered sequences of gene family memberships as a unit of search and comparison, the user interface enables quick and intuitive assessment of the degree of gene content divergence and the presence of various types of structural events within syntenic contexts. Insights into functionally significant differences seen at this level of abstraction can then serve to direct the user to more detailed explorations of the underlying data in other interconnected, provider-specific tools.
Availability and implementation
GCV is provided under the GNU General Public License version 3 (GPL-3.0). Source code is available at https://github.com/legumeinfo/lis_context_viewer.
Supplementary information
Supplementary data are available at Bioinformatics online.
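The idea of treating an ordered run of gene family memberships as the unit of comparison can be sketched with a longest common subsequence (LCS) score, taking the better of the forward and reversed orientation so that an inversion still scores as shared content. LCS is a swapped-in measure chosen for illustration; it is not necessarily the alignment method GCV uses, and the family identifiers below are hypothetical.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two gene-family sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend a match diagonally, otherwise carry the best score so far.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def syntenic_score(query: list[str], target: list[str]) -> int:
    """Best LCS over both orientations of the target region."""
    return max(lcs_length(query, target), lcs_length(query, target[::-1]))


# Hypothetical gene family IDs for two genomic regions.
query = ["fam1", "fam2", "fam3", "fam4"]
target = ["fam4", "fam3", "fam2", "fam9"]
print(lcs_length(query, target), syntenic_score(query, target))
```

Comparing the forward and reversed scores gives a crude signal of an inversion, one of the structural events the viewer is meant to make visible.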

Practical dynamic de Bruijn graphs

https://doi.org/10.1093/bioinformatics/bty500

Crawford, Victoria G. ; Kuhnle, Alan ; Boucher, Christina ; Chikhi, Rayan ; Gagie, Travis ; Hancock, ed., John ( June 2018 , Bioinformatics)

Abstract Motivation
The de Bruijn graph is fundamental to the analysis of next generation sequencing data and so, as datasets of DNA reads grow rapidly, it becomes more important to represent de Bruijn graphs compactly while still supporting fast assembly. Previous implementations of compact de Bruijn graphs have not supported node or edge deletion, however, which is important for pruning spurious elements from the graph.
Results
Belazzougui et al. (2016b) recently proposed a compact and fully dynamic representation, which supports exact membership queries and insertions and deletions of both nodes and edges. In this paper, we give a practical implementation of their data structure, supporting exact membership queries and fully dynamic edge operations, as well as limited support for dynamic node operations. We demonstrate experimentally that its performance is comparable to that of state-of-the-art implementations based on Bloom filters.
Availability and implementation
Our source code is publicly available at https://github.com/csirac/dynamicDBG under an open-source license.
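The operations the paper supports (exact membership queries plus edge insertion and deletion) can be illustrated with a deliberately naive, uncompressed de Bruijn graph that stores each edge as a (k+1)-mer in a hash set. This sketch only shows the interface; it is not the compact Belazzougui et al. structure and uses far more memory. Node membership is inferred from edges, so isolated nodes are not represented, which is a simplification of this sketch.

```python
class NaiveDynamicDBG:
    """Uncompressed de Bruijn graph: nodes are k-mers, edges are (k+1)-mers."""

    ALPHABET = "ACGT"

    def __init__(self, k: int):
        self.k = k
        self.edges: set[str] = set()

    def add_sequence(self, seq: str) -> None:
        # Every (k+1)-length window of a read is one edge between two k-mers.
        for i in range(len(seq) - self.k):
            self.add_edge(seq[i:i + self.k + 1])

    def add_edge(self, kp1mer: str) -> None:
        if len(kp1mer) != self.k + 1:
            raise ValueError("edge must be a (k+1)-mer")
        self.edges.add(kp1mer)

    def remove_edge(self, kp1mer: str) -> None:
        self.edges.discard(kp1mer)  # pruning a spurious edge

    def has_edge(self, kp1mer: str) -> bool:
        return kp1mer in self.edges

    def has_node(self, kmer: str) -> bool:
        # A k-mer is present if it starts or ends at least one stored edge.
        return any(kmer + c in self.edges or c + kmer in self.edges for c in self.ALPHABET)

    def successors(self, kmer: str) -> list[str]:
        return [kmer[1:] + c for c in self.ALPHABET if kmer + c in self.edges]


g = NaiveDynamicDBG(3)
g.add_sequence("ACGTAC")
print(g.successors("CGT"))
g.remove_edge("CGTA")
print(g.successors("CGT"), g.has_node("CGT"))
```

A compact representation replaces the hash set with succinct structures while keeping the same query and update semantics.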

